Nature Biotechnology

Springer Science and Business Media LLC

Preprints posted in the last 7 days, ranked by how well they match Nature Biotechnology's content profile, based on 147 papers previously published here. The average preprint has a 0.34% match score for this journal, so anything above that is already an above-average fit.

1
GRASP: Gene-relation adaptive soft prompt for scalable and generalizable gene network inference with large language models

Feng, Y.; Deng, K.; Guan, Y.

2026-04-14 bioinformatics 10.1101/2025.10.20.683485 medRxiv
Top 2%
4.9%

Gene networks (GNs) encode diverse molecular relationships and are central to interpreting cellular function and disease. The heterogeneity of interaction types has led to computational methods specialized for particular network contexts. Large language models (LLMs) offer a unified, language-based formulation of GN inference by leveraging biological knowledge from large-scale text corpora, yet their effectiveness remains sensitive to prompt design. Here, we introduce Gene-Relation Adaptive Soft Prompt (GRASP), a parameter-efficient and trainable framework that conditions inference on each gene pair through only three virtual tokens. Using factorized gene-specific and relation-aware components, GRASP learns to map each pair's biological context into compact soft prompts that combine pair-specific signals with shared interaction patterns. Across diverse GN inference tasks, GRASP consistently outperforms alternative prompting strategies. It also shows a stronger ability to recover unannotated interactions from synthetic negative sets, suggesting its capacity to identify biologically meaningful relationships beyond existing databases. Together, these results establish GRASP as a scalable and generalizable prompting framework for LLM-based GN inference.

2
Single-molecule cfDNA sequencing establishes clinical utility for ecDNA monitoring and multimodal liquid biopsy analysis

Sauer, C. M.; Tovey, N.; Ptasinska, A.; Hughes, D.; Stockton, J.; Zumalave, S.; Rust, A. G.; Lynn, C.; Livellara, V.; Sevrin, F.; Himsworth, C.; Muyas, F.; Nicolaidou, M.; Parry, G.; Paisana, E.; Cascao, R.; Ahmed, S. W.; Yasin, S. A.; Portela, L. R.; Balasubramanian, P.; Burke, G. A. A.; Vedi, A.; Faria, C. C.; Marshall, L. V.; Jacques, T. S.; Hubank, M.; Hargrave, D.; George, S.; Angelini, P.; Anderson, J.; Chesler, L.; Beggs, A. D.; Cortes-Ciriano, I.

2026-04-12 oncology 10.64898/2026.04.08.26350410 medRxiv
Top 3%
2.8%

Cell-free DNA (cfDNA) profiling enables minimally invasive cancer detection and monitoring. We present SIMMA, a low-input single-molecule sequencing approach that enables multimodal whole-genome and high-depth targeted sequencing of the same cfDNA sample for both tumour-agnostic and tumour-informed liquid biopsy analysis. Across 792 plasma and cerebrospinal fluid cfDNA samples from 277 paediatric patients with diverse brain and extracranial tumours, SIMMA enabled tumour diagnosis, detection of driver mutations, and reconstruction of extrachromosomal DNA (ecDNA) months before clinical relapse. Using conformal prediction trained on genome-wide fragmentomics, genomic and epigenomic data, SIMMA predicts disease burden as a continuous variable and provides well-calibrated uncertainty estimates for each sample, achieving a limit of detection of ~100 ppm from low-pass whole-genome sequencing data. In summary, SIMMA establishes the clinical utility of multimodal cfDNA profiling with uncertainty quantification for individual patients and unlocks the potential of ecDNA as a liquid biopsy biomarker for disease detection and monitoring across diverse aggressive malignancies.

3
Dynamic Quantum Clustering of Gliomas RNA-seq Identifies Diagnostic Separation and Survival Gradients

Jahaniani, F.; Schrodi, S. J.; Weinstein, M.

2026-04-10 genetic and genomic medicine 10.64898/2026.04.09.26350535 medRxiv
Top 4%
2.1%

Public RNA-seq sample sets can refine per-tumor diagnosis and risk, but heterogeneous biology and analytic drift often obscure structure. Dynamic Quantum Clustering (DQC), an unsupervised geometry-preserving method requiring no clinical labels or preset cluster counts, addresses both challenges. Applied to RNA-seq from 692 TCGA gliomas (524 low-grade gliomas (LGG), 168 glioblastomas (GBM); 20,057 protein-coding genes), DQC produced two dominant clusters with 90.9% post hoc diagnostic concordance and clear survival time separation. Filtering genes by inter-cluster mean differences yielded a 554-gene subset that improved accuracy to 97.3%. Rank ordering these genes identified ~90 genes that, under DQC, produced three LGG-pure subclusters with ordered but different survival outcomes and one GBM-rich cluster (PPV 97.1%). RNA-based clustering without clinical information thereby reveals molecular groupings that mirror critically important clinical features. Comparing these clusters defined four nonoverlapping gene modules and assigned four BioCoords per tumor. DQC with BioCoords recapitulated the LGG-to-GBM continuum with a mesenchymal/invasion-extracellular matrix axis exhibiting a monotonic survival gradient, illustrating how geometry-aware unsupervised learning can translate bench and computational discovery into meaningful biology-based patient stratification and prognosis.

4
Efficient generation of epitope-targeted de novo antibodies with Germinal

Mille-Fragoso, L. S.; Driscoll, C. L.; Wang, J. N.; Dai, H.; Widatalla, T. M.; Zhang, J. L.; Zhang, X.; Rao, B.; Feng, L.; Hie, B. L.; Gao, X. J.

2026-04-15 synthetic biology 10.1101/2025.09.19.677421 medRxiv
Top 4%
2.1%

Obtaining novel antibodies against specific protein targets is a widely important yet experimentally laborious process. Meanwhile, computational methods for antibody design have been limited by low success rates that currently require resource-intensive screening. Here, we introduce Germinal, a broadly enabling generative pipeline that designs antibodies against specific epitopes with nanomolar binding affinities while requiring only low-n experimental testing. Our method co-optimizes antibody structure and sequence by integrating a structure predictor with an antibody-specific protein language model to perform de novo design of functional complementarity-determining regions (CDRs) onto a user-specified structural framework. When tested against four diverse protein targets, Germinal successfully designed functional antibodies across all targets and binder formats, testing only 43-101 designs for each antigen. Validated designs also exhibited robust expression in mammalian cells and high sequence and structural novelty. We provide open-source code and full computational and experimental protocols to facilitate wide adoption. Germinal represents a milestone in efficient, epitope-targeted de novo antibody design, with notable implications for the development of molecular tools and therapeutics.

5
Vector2Variant: Discovery of Genetic Associations from ML Derived Representations without Phenotype Engineering

Sooknah, M.; Srinivasan, R.; Sankarapandian, S.; Chen, Z.; Xu, J.

2026-04-17 genetic and genomic medicine 10.64898/2026.04.10.26350624 medRxiv
Top 5%
1.6%

Genome-wide association studies (GWAS) have transformed our understanding of human biology, but are constrained by the need for predefined phenotypes. We introduce Vector2Variant (V2V), a general-purpose framework that transforms any set of high-dimensional measurements (such as machine learning embeddings) into a genome-wide scan for associations, without requiring rigid specification of a phenotype. Rather than testing genetic variants against single traits, V2V finds the axis in multivariate space along which carriers and non-carriers maximally differ, and produces a continuous "projection phenotype" that can be interpreted by association with disease labels. The projection phenotypes correlate with orthogonal clinical biomarkers never seen during training, suggesting the learned axes capture biologically meaningful variation. We applied V2V to imaging, timeseries, and omics modalities in the UK Biobank and recovered established biology (like the role of CASP9 in renal failure) without the need for targeted measurements, alongside novel associations including a frameshift variant in LRRIQ1 (potentially protective for cardiovascular disease). V2V is computationally efficient at genome-wide scale, producing summary statistics and disease associations that facilitate target prioritization without the need for phenotype engineering.

6
Why Invariant Risk Minimization Fails on Tabular Data: A Gradient Variance Solution

Mboya, G. O.

2026-04-13 epidemiology 10.64898/2026.04.09.26350513 medRxiv
Top 6%
0.9%

Machine learning models trained on observational data from one environment frequently fail when deployed in another, because standard learning algorithms exploit spurious correlations alongside causal ones. Invariant learning methods address this problem by seeking representations that support stable prediction across training environments, but their behavior on tabular data remains poorly characterized. We present CausTab, a gradient variance regularization framework for causal invariant representation learning on mixed tabular data. CausTab penalizes the variance of parameter gradients across training environments, providing a richer invariance signal than the scalar penalty used by Invariant Risk Minimization (IRM). We provide formal results showing that the gradient variance penalty is zero at causally invariant solutions and positive at solutions that rely on spurious features. Through experiments on synthetic data across three spurious-correlation regimes, four cycles of the National Health and Nutrition Examination Survey (NHANES), and four hospital systems in the UCI Heart Disease dataset, we demonstrate that: (1) IRM consistently degrades relative to standard empirical risk minimization (ERM) on tabular data, losing up to 13.8 AUC points in spurious-dominant settings, a failure we trace mechanistically to penalty collapse during training; (2) CausTab matches or exceeds ERM in every experimental condition; (3) CausTab achieves consistently better probability calibration than both ERM and IRM; and (4) invariant learning methods fail when environments differ in outcome prevalence rather than in spurious feature correlations, a boundary condition we characterize both empirically and theoretically. We introduce the Spurious Dominance Index (SDI), a practical scalar diagnostic for determining whether a dataset requires invariant learning, and validate it across all experimental settings.
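
As a rough illustration of the idea of penalizing the variance of parameter gradients across environments, the sketch below computes such a penalty for a plain logistic-regression model. It is not CausTab's implementation: the model, the function names (`logistic_gradient`, `gradient_variance_penalty`), and the choice to sum per-parameter population variances are all assumptions made for illustration.

```python
import math


def logistic_gradient(weights, rows, labels):
    """Mean gradient of the logistic loss w.r.t. the weights for one environment."""
    grad = [0.0] * len(weights)
    for x, y in zip(rows, labels):
        z = sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))  # predicted probability
        for j, xi in enumerate(x):
            grad[j] += (p - y) * xi
    return [g / len(rows) for g in grad]


def gradient_variance_penalty(weights, environments):
    """Sum over parameters of the variance of per-environment gradients.

    Assumed form for illustration; the abstract does not give the exact penalty.
    """
    grads = [logistic_gradient(weights, rows, labels) for rows, labels in environments]
    n_env = len(grads)
    penalty = 0.0
    for j in range(len(weights)):
        column = [g[j] for g in grads]
        mean = sum(column) / n_env
        penalty += sum((v - mean) ** 2 for v in column) / n_env
    return penalty


# Two toy environments that share features but disagree on the labels.
rows = [[1.0, 0.0], [0.0, 1.0]]
env_a = (rows, [1, 0])
env_b = (rows, [0, 1])

same = gradient_variance_penalty([0.0, 0.0], [env_a, env_a])
different = gradient_variance_penalty([0.0, 0.0], [env_a, env_b])
```

In this toy setup the penalty vanishes when the environments yield identical gradients and is positive when they disagree. A training loop would add a multiple of it to the pooled loss.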

7
De novo designed bifunctional proteins for targeted protein degradation

Mylemans, B.; Korona, B.; Acevedo-Jake, A. M.; MacRae, A.; Edwards, T. A.; Huang, D. T.; Wilson, A. J.; Itzhaki, L. S.; Woolfson, D. N.

2026-04-15 synthetic biology 10.64898/2025.12.22.695915 medRxiv
Top 6%
0.9%

Targeted protein degradation (TPD) is a therapeutic strategy to remove disease-causing proteins by routing them to the ubiquitin-proteasome, autophagy, or lysosome machineries. For instance, proteolysis-targeting chimeras (PROTACs) are synthetic hetero-bifunctional small molecules that simultaneously bind the target and an E3 ubiquitin ligase to drive ubiquitination and degradation by the proteasome. Despite considerable success, designing such molecules is challenging and the number of currently addressable ubiquitin E3 ligases is limited. Here we demonstrate hetero-bifunctional de novo designed proteins as alternatives for TPD to access more targets and ligases. First, we develop a stable and highly adaptable helix-turn-helix scaffold for presenting different binding sites. Next, we use computational protein design to incorporate and embellish hot-spot binding sites to target BCL-xL, plus short linear motifs (SLiMs) for KLHL20 ligase recruitment. The resulting mono- and bi-functionalised proteins bind the targets in vitro, and the latter degrade BCL-xL in cells leading to apoptosis.

8
Fine-Tuning PubMedBERT for Hierarchical Condition Category Classification

Wang, X.; Hammarlund, N.; Prosperi, M.; Zhu, Y.; Revere, L.

2026-04-15 health systems and quality improvement 10.64898/2026.04.13.26350814 medRxiv
Top 7%
0.8%

Automating Hierarchical Condition Category (HCC) assignment directly from unstructured electronic health record (EHR) notes remains an important but understudied problem in clinical informatics. We present HCC-Coder, an end-to-end NLP system that maps narrative documentation to 115 Centers for Medicare & Medicaid Services (CMS) HCC codes in a multi-label setting. On the test dataset, HCC-Coder achieves a macro-F1 of 0.779 and a micro-F1 of 0.756, with a macro-sensitivity of 0.819 and macro-specificity of 0.998. By contrast, Generative Pre-trained Transformer (GPT)-4o achieves its highest scores, a macro-F1 of 0.735 and a micro-F1 of 0.708, under five-shot prompting. The fine-tuned model demonstrates consistent absolute improvements of 4-5 percentage points in F1-scores over GPT-4o. To address severe label imbalance, we incorporate inverse-frequency weighting and per-label threshold calibration. These findings suggest that domain-adapted transformers provide more balanced and reliable performance than prompt-based large language models for hierarchical clinical coding and risk adjustment.
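
The two imbalance remedies named in this abstract, inverse-frequency weighting and per-label threshold calibration, can be sketched generically as follows. This is not HCC-Coder's code: the weight formula (samples divided by positives), the F1-maximizing grid search, and all function names are assumptions chosen for illustration.

```python
def inverse_frequency_weights(label_matrix):
    """One weight per label: total samples divided by that label's positive count."""
    n_samples = len(label_matrix)
    n_labels = len(label_matrix[0])
    weights = []
    for j in range(n_labels):
        positives = sum(row[j] for row in label_matrix)
        weights.append(n_samples / max(positives, 1))  # guard against empty labels
    return weights


def f1_score(y_true, y_pred):
    """Binary F1; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def calibrate_threshold(y_true, scores, grid=None):
    """Pick the lowest grid threshold that maximizes F1 for a single label."""
    grid = grid or [i / 20 for i in range(1, 20)]
    best_threshold, best_f1 = 0.5, -1.0
    for threshold in grid:
        preds = [1 if s >= threshold else 0 for s in scores]
        score = f1_score(y_true, preds)
        if score > best_f1:
            best_threshold, best_f1 = threshold, score
    return best_threshold


# Toy multi-label data: label 0 is common, label 1 is rare.
labels = [[1, 0], [1, 0], [1, 1], [1, 0]]
weights = inverse_frequency_weights(labels)

# Validation scores for one label; the threshold is tuned on these.
threshold = calibrate_threshold([0, 0, 1, 1], [0.12, 0.22, 0.61, 0.90])
```

In practice the weights would scale each label's loss term during fine-tuning, and `calibrate_threshold` would be run once per label on a held-out validation split.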

9
Drug response profiling guides precision therapy in relapsed and refractory childhood acute lymphoblastic leukemia

Steffen, F. D.; Lissat, A.; Alten, J.; Kriston, A.; Scheidegger, N.; Eckert, C.; Bodmer, N.; Schori, L.; Schühle, S.; Arpagaus, A.; Gutnik, S.; Manioti, D.; Bruderer, N.; Zeckanovic, A.; Västrik, I.; Nyiri, G.; Kovacs, F.; Thorhauge Als-Nielsen, B. E.; Attarbaschi, A.; Rademacher, A.; Elitzur, S.; Jacoby, E.; De Moerloose, B.; Svenberg, P.; Ancliff, P.; Sramkova, L.; Buldini, B.; Balduzzi, A.; Boer, J. M.; Mielcarek, M.; Ceppi, F.; Ansari, M.; Halter, J.; Schmiegelow, K.; Locatelli, F.; DelBufalo, F.; Stanulla, M.; Kulozik, A. E.; Schrappe, M.; Rohrlich, P.; Cave, H.; Baruchel, A.; von Stack

2026-04-11 oncology 10.64898/2026.04.08.26350164 medRxiv
Top 8%
0.7%

Children with relapsed or refractory acute lymphoblastic leukemia (ALL) require more effective and less toxic therapies. We established a prospective, multicenter Drug Response Profiling (DRP) registry (NCT06550102) integrating functional testing into precision-guided treatment. DRP was performed for 340 patients from 17 European countries with a turnaround time of two weeks. Image-based drug screening with over 135,000 unique perturbations revealed a heterogeneous landscape of ex vivo responses to 88 drugs on average. Ranking drug responses across the patient cohort defined individual drug fingerprints, identifying "DRP twins" by similarity in sensitivity and resistance independent of genetic ALL subtypes. Of 239 high-risk patients with follow-up, DRP-informed interventions were reported for 63 patients (26%). Patients received combination therapies based on venetoclax, tyrosine kinase inhibitors, trametinib, bortezomib or selinexor, resulting in objective clinical responses in 43 cases (68%). Precision-guided treatments allowed bridging to cellular therapies in 42 patients among whom 28 (67%) were still alive with a median follow-up of 21 months after DRP (IQR: 14.7-26.6 months). Top responders to venetoclax, ranked within the first tertile of the cohort, had superior 1-year event-free survival compared to venetoclax non-responders (0.57 [95% CI, 0.39-0.85] vs. 0.25 [95% CI, 0.11-0.58]). Collectively, these findings demonstrate the feasibility and clinical relevance of functional profiling within an international network. This scalable framework enables individualized therapy selection for enrolment in adaptive precision trials for high-risk pediatric ALL.

10
SARS-CoV-2 Introductions into Lao PDR Revealed by Genomic Surveillance, 2021-2024

Panapruksachat, S.; Troupin, C.; Souksavanh, M.; Keeratipusana, C.; Vongsouvath, M.; Vongphachanh, S.; Vongsouvath, M.; Phommasone, K.; Somlor, S.; Robinson, M. T.; Chookajorn, T.; Kochakarn, T.; Day, N. P.; Mayxay, M.; Letizia, A. G.; Dubot-Peres, A.; Ashley, E. A.; Buchy, P.; Xangsayarath, P.; Batty, E. M.

2026-04-13 epidemiology 10.64898/2026.04.09.26349480 medRxiv
Top 8%
0.6%

We used 2492 whole genome sequences from Laos to investigate the molecular epidemiology of SARS-CoV-2 from 2021 through 2024, covering the major waves of COVID-19 disease in Laos including time periods of travel restrictions and after relaxation of travel across international borders. We identify successive waves of COVID-19 caused by shifts in the dominant lineage, beginning with the Alpha variant in April 2021 and continuing through the Delta and Omicron variants. We quantify a shift from a small number of viral introductions responsible for widespread transmission in early waves to a larger number of introductions for each variant after travel restrictions were lifted, and identify potential routes of introduction into the country. Our study underscores the importance of genomic surveillance to public health responses to characterize viral transmission dynamics during pandemics.

11
Location-, intensity-, and frequency-optimized epidural stimulation restores hand function after complete spinal cord injury

Oh, J.; Steele, A. G.; Scheffler, M.; Martin, C.; Sheynin, J.; Dietz, V. A.; Valdivia-Padilla, A.; Stampas, A.; Korupolu, R.; Karmonik, C.; Hodics, T. M.; Freyvert, Y.; Manzella, M.; Faraji, A. H.; Horner, P. J.; Sayenko, D. G.

2026-04-11 rehabilitation medicine and physical therapy 10.64898/2026.04.07.26349471 medRxiv
Top 9%
0.5%

Cervical spinal cord injury (SCI) causes profound and persistent loss of hand function, and effective neuromodulation strategies remain limited. We report the first-in-human implantation of a 32-contact cervical epidural paddle array in two individuals with severe chronic SCI. Individualized motor pool recruitment maps, derived from systematic bipolar and multipolar configurations, enabled person-specific stimulation parameters. Optimized stimulation restored volitional hand opening, closing and coordinated upper-limb movements that were previously unattainable. This approach achieved a >91% success rate in complex reach-grasp-lift-release sequences, supported by substantial gains in range of motion, grip, and pinch strength. Electrophysiological and kinematic analyses demonstrated parameter-dependent, selective recruitment of flexor and extensor motor pools. Personalized stimulation programs integrated with goal-directed activities enabled functional hand use in home and community settings, sustained over several months of continued autonomous use. These findings establish a mechanistically grounded and translational framework for restoring upper-limb function after chronic severe SCI.

12
Classifying and Differentiating Individuals with Respiratory Syncytial Virus, Influenza, and COVID-19 Cases in OpenSAFELY

Prestige, E.; Warren-Gash, C.; Quint, J. K.; Evans, D.; Costello, R. E.; Mehrkar, A.; Bacon, S.; Goldacre, B.; Barley-McMullen, S.; Yameen, F.; Shah, P.; Natt, M.; Alder, Y.; Hulme, W.; Parker, E. P. K.; Eggo, R. M.

2026-04-13 infectious diseases 10.64898/2026.04.09.26350495 medRxiv
Top 11%
0.4%

Electronic health records (EHRs) are a rich source of data which can be used to analyse health outcomes using computable phenotypes. With the approval of NHS England we used the OpenSAFELY secure analytics platform to design and assess phenotypes to classify three key respiratory viruses - respiratory syncytial virus (RSV), influenza, and COVID-19 - in English coded health data between September 2016 and August 2024. We compared specific and sensitive phenotypes to one another and to publicly available surveillance data. Cases from both phenotypes showed similar seasonal patterns to surveillance data. Sensitive phenotypes carried a higher risk of misclassification than specific phenotypes for mild cases. For severe cases the risk of misclassification was higher in infants than in older adults, irrespective of the phenotype used. The phenotypes presented here offer a solution to classifying respiratory viruses from coded health records in the absence of testing information.

13
One Health genomics of Acinetobacter baumannii reveals sector-specific lineages and permeable ecological barriers

Plantade, J.; Escobar, C.; Godeux, A.-S.; Poire, L.; Andre, A.; Deromelaere, V.; Cassier, P.; Rasigade, J.-P.; Nazaret, S.; Coluzzi, C.; Venner, S.; Laaberki, M.-H.; Charpentier, X.

2026-04-11 infectious diseases 10.64898/2026.04.09.26350516 medRxiv
Top 11%
0.3%

Acinetobacter baumannii is a major cause of severe hospital-acquired infections, with a steadily increasing global prevalence driven by a few clinically adapted lineages. Animals and natural environments also harbor A. baumannii populations, but assessing their connections to clinical lineages is limited by sparse genomic data and a lack of integrated sampling. We conducted a local One Health genomic epidemiology study, sampling, isolating, sequencing, and characterizing several hundred A. baumannii isolates from clinical, animal, and environmental contexts. Within a geographically restricted area, we recovered several globally distributed clinical lineages (international clones, ICs), as well as livestock- and environment-associated lineages shared across Europe, highlighting widespread dissemination beyond clinical settings. Isolates closely related to the emerging clinical lineage IC11 were found in livestock, but no other clinically associated lineages were detected outside clinical contexts. Among these, the epidemic superlineage IC2 was identified in both human and veterinary clinical settings, indicating that similar practices in human and animal medicine select for closely related opportunistic pathogens. We found that hospitals host distinct, antibiotic-sensitive endemic populations capable of causing infection. These populations belong to a diversifying clade spanning clinical and environmental contexts and carry a high load of insertion sequences. Strong plasmid conservation further suggests frequent horizontal gene transfer across ecological compartments. Overall, A. baumannii comprises diverse, context-adapted lineages with a high potential for global spread. Although intercontext transmission appears limited, plasmids may overcome these ecological barriers. Our findings underscore the need for integrated One Health surveillance to better understand transmission pathways and limit the emergence of clinically adapted strains.

14
A multidomain intrinsic capacity score tracks longitudinal health trajectories in the UK Biobank

Zhai, T.; Babu, M.; Fuentealba, M.; Al Dajani, S.; Gladyshev, V. N.; Furman, D.; Snyder, M.

2026-04-13 epidemiology 10.64898/2026.04.10.26350621 medRxiv
Top 11%
0.3%

Quantitative measures for tracking functional health have generally been lacking. Intrinsic capacity (IC) has been proposed as an appropriate measure, but its metrics have been derived from small datasets with sparse longitudinal data. Using harmonized measures of cognition, locomotion, sensory function, vitality, and psychological well-being from 501,615 UK Biobank participants followed for a median of 15.5 years, we derived domain-specific and composite IC scores. We examined associations with incident disease, cause-specific mortality, multimorbidity, lifestyle and socioeconomic factors, and multi-omic profiles from Olink proteomics, NMR metabolomics, clinical biochemistry, and blood-cell traits. We found that composite IC declined non-linearly with age, and within-person decline was steeper than the cross-sectional age measures. Participants with greater baseline morbidity, those who subsequently developed incident disease, and those who died earlier in follow-up showed lower IC trajectories across adulthood. The IC domains were only modestly correlated with one another, supporting multidimensionality, yet higher overall IC was associated with lower risk of most diseases examined. The dominant IC domain varied by endpoint, with cognition informative for dementia, sensory function for hearing loss, psychological capacity for depression, locomotion for osteoarthritis, and vitality for cardiometabolic outcomes. IC was also associated cross-sectionally with physical activity, insomnia, smoking, medication burden, and socioeconomic disadvantage. More proteins were found to be predictive of vitality, and enrichment converged on immune/inflammatory and metabolic pathways. Blood-based surrogates recapitulated part of the phenotypic signal, particularly for vitality.
Overall, this IC framework captures longitudinal health trajectories and broad disease vulnerability in a large middle- to older-aged cohort, supports IC as a clinically meaningful, multidomain phenotype of aging, and identifies blood-based correlates that may facilitate future at-scale monitoring of aging-related functional decline.

15
Recombinant zoster vaccination in patients with dementia is associated with improved survival and better cognitive preservation

Soltys, K.; Sara-Buchbut, R.; Ish Shalom, N.; Stokar, J.; Klein, B. Y.; Calderon-Margalit, R.; Greenblatt, C. L.; Ben-Haim, M. S.

2026-04-13 epidemiology 10.64898/2026.04.09.26350509 medRxiv
Top 11%
0.3%

Dementia affects tens of millions of people worldwide, yet disease-modifying treatments remain strikingly limited. Although the recombinant zoster vaccine Shingrix has been associated with reduced dementia incidence, its potential influence on individuals already living with dementia is unknown. Here, we followed a propensity-score matched cohort of 68,960 US dementia patients using a nationwide electronic health record network, comparing Shingrix recipients within two years of diagnosis to recipients of any other vaccine. Shingrix was associated with substantially reduced all-cause mortality across the first three years of follow-up (hazard ratios 0.74, 0.88, and 0.89; P≤0.006), robust across multiple sensitivity analyses. Furthermore, within-individual subgroup analyses of repeated Mini-Mental State Examinations conducted 3-6 years apart revealed significantly divergent cognitive decline rates across groups (time-by-group interaction P=0.002). Interval vaccination was associated with more stable cognition, contrasting with steeper declines in unvaccinated individuals. These findings support prospective evaluation of recombinant zoster vaccination as a potential strategy to improve outcomes in patients with established dementia.

16
HAARF: Healthcare AI Agents Regulatory Framework - A Comprehensive Security Verification Standard for Autonomous AI Systems in Clinical Environments

Schwoebel, J.; Frasch, M.; Spalding, A.; Sewell, E.; Englert, P.; Halpert, B.; Overbay, C.; Semenec, I.; Shor, J.

2026-04-13 health systems and quality improvement 10.64898/2026.04.09.26350519 medRxiv
Top 12%
0.3%

As health systems begin deploying autonomous AI agents that make independent clinical decisions and take direct actions within care workflows, ensuring patient safety and care quality requires governance standards that go beyond existing medical device frameworks designed for human-in-the-loop prediction tools. This paper introduces the Healthcare AI Agents Regulatory Framework (HAARF), a comprehensive verification standard for autonomous AI systems in clinical environments, developed collaboratively with 40+ international experts spanning regulatory authorities, clinical organizations, and AI security specialists. HAARF synthesizes requirements from nine major regulatory frameworks (FDA, EU AI Act, Health Canada, UK MHRA, NIST AI RMF, WHO GI-AI4H, ISO/IEC 42001, OWASP AISVS, IMDRF GMLP) into eight core verification categories comprising 279 specific requirements across three risk-based implementation levels. The framework addresses critical gaps in health system readiness for autonomous AI including: (1) progressive autonomy governance with clinical accountability, (2) tool-use security for agents that independently access EHRs, medical devices, and clinical systems, (3) continuous equity monitoring and bias mitigation across diverse patient populations, and (4) clinical decision traceability preserving human oversight authority. We validate HAARF's enforcement capabilities through a scenario-based red-team evaluation comprising six adversarial scenarios executed under baseline (no middleware) and HAARF-guardrailed conditions (N = 50 trials each, Gemini 2.5 Flash primary with Claude Sonnet 4.6 cross-model validation). In baseline conditions, the agent model executes unauthorized tools in 56-60% of adversarial trials. Under the HAARF condition, deterministic middleware enforcement reduces the unauthorized-tool success rate to 0%, with 0% contraindication misses and 0% policy-injection success (95% Wilson CI [0.00, 0.07]).
Cross-model validation confirms identical security metrics, supporting HAARF's model-agnostic design. Mapping analysis demonstrates 48-88% coverage of major regulatory frameworks, with per-category FDA alignment ranging from 73% (C5, Agent Registration) to 91% (C3, Cybersecurity; C7, Bias & Equity). Initial validation with healthcare organizations shows a 40-60% reduction in multi-jurisdictional compliance burden and improved clinical safety governance outcomes. HAARF provides health systems with a practical, risk-stratified pathway for safe AI agent deployment, shifting from reactive compliance to proactive quality governance while maintaining rigorous patient safety standards and human-centered care principles.
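
The Wilson interval quoted in this abstract for 0 events in 50 trials can be checked with the standard Wilson score formula. The sketch below is a generic implementation rather than the authors' analysis code; the function name `wilson_interval` and the default z value for a two-sided 95% level are choices made here.

```python
import math


def wilson_interval(successes, trials, z=1.959964):
    """Wilson score interval for a binomial proportion (default z gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Zero unauthorized-tool successes out of 50 guardrailed trials.
low, high = wilson_interval(0, 50)
```

With 0 successes in 50 trials the lower bound is 0 and the upper bound is about 0.071, which rounds to the reported [0.00, 0.07]. Unlike the simple normal-approximation interval, the Wilson interval does not collapse to zero width when no events are observed.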

17
From Chaos to Care: Personalized AI for Early Cardiac Arrhythmia Warning

Halder, S.; Kim, C. M.; Periwal, V.

2026-04-10 cardiovascular medicine 10.64898/2026.04.08.26350403 medRxiv
Top 13%
0.3%

Cardiac arrhythmias are abnormal heart rhythms characterized by disordered electrical dynamics that impair cardiac function and pose a major global burden of morbidity and mortality. Early and accurate prediction of arrhythmic anomalies from physiological time series is crucial for effective intervention, yet remains challenging due to the nonlinear, nonstationary, and individualized nature of cardiac dynamics. Despite significant advances in machine learning-based arrhythmia detection, most existing methods operate as static classifiers on electrocardiographic signals and lack online prediction, patient-specific adaptation, and mechanistic interpretability. From a dynamical-systems perspective, arrhythmias represent qualitative regime transitions, often preceded by subtle, temporally extended deviations that are difficult to detect in real time. Here we introduce CASCADE (Chaotic Attractor Sensitivity for Cardiac Anomaly Detection), an online and personalized anomaly forecasting framework built on a special type of reservoir computing called Dynamical Systems Machine Learning (DynML). DynML employs ensembles of continuous-time nonlinear dynamical systems as chaotic reservoirs to reconstruct and forecast short-term cardiac dynamics on a beat-to-beat basis, training only a linear readout. This design enables efficient online adaptation without retraining the underlying dynamical model. Rather than relying on static beat-level classification, CASCADE identifies arrhythmic events as failures of short-term predictability, manifested as statistically significant deviations between predicted and observed dynamics relative to subject-specific baselines. Detection performance is governed by the intrinsic dynamical complexity of the reservoir, quantified by topological entropy. 
Reservoirs operating near critical entropy regimes optimally amplify subtle, temporally extended irregularities in heartbeat dynamics, rendering incipient arrhythmic signatures linearly separable at the readout level. Topological entropy thus serves both as a predictor of model performance and a principled control parameter for reservoir design. When evaluated on the MIT-BIH Arrhythmia dataset, CASCADE achieved consistently high F1 scores, precision, recall, and overall accuracy across diverse patient populations, demonstrating strong generalizability across clinical and real-world settings. By integrating chaotic reservoir computing, entropy-guided tuning, and online personalized forecasting, CASCADE reframes arrhythmia detection as a problem of dynamical regime transition rather than static classification. This perspective provides a scalable, interpretable, and computationally efficient framework for real-time cardiac monitoring and early-warning clinical decision support.

18
T Cell Clonal Groups are Broadly Dispersed in Colon, Phenotypically Diverse, and Altered in Ulcerative Colitis

Fischer, J.; Spindler, M. P.; Britton, G. J.; Weiler, J.; Tankelevich, M.; Dai, D.; Canales-Herrerias, P.; Jha, D.; Rajpal, U.; Mehandru, S.; Faith, J. J.

2026-04-11 gastroenterology 10.64898/2026.04.10.26350469 medRxiv
Top 13%
0.3%

Our understanding of human mucosal T cell clonotype distribution in health and disease has centered on immunodominant antigens. We performed single-cell T cell receptor (TCR) and RNA sequencing as an untargeted approach to define distributions of T cell clonal groups in health and ulcerative colitis (UC) across 333,088 T cells in colon and peripheral blood. Healthy donor-specific TCR repertoires had limited blood-colon clonal sharing, which was highest in cytotoxic T effector memory (Tem) populations and lowest in regulatory T cells (Tregs), reflecting tissue-based compartmentalization. Within healthy colon, TCR repertoires showed high T cell clonal sharing independent of anatomic distance, associated with high intra-clonal phenotypic diversity. Colon cytotoxic and Th17 populations showed high dispersion across sites, while Tregs were compartmentalized. Clonal lineages dispersed across blood and colon upregulated trafficking markers, suggesting active movement between tissues, while those dispersed across colon sites upregulated residency markers, suggesting intra-colon repertoire sharing is mediated by long-term, slow-moving clonal groups. In UC, Tregs were expanded across inflamed sites, and increased CD8 Tem clonal groups showed increased dispersion regardless of inflammation. These findings reveal principles of T cell clonal organization in the human colon during health and disease, identifying opposing patterns of clonal dispersion among Treg and Th17 clonal groups, high phenotypic diversity within dispersed clonal groups, and elevated cross-colon dispersion of CD8 Tem clonotypes in UC.

19
VAE (Variational Autoencoder) Based Gastrotype Identification and Predictive Diagnosis of Helicobacter pylori Infection

Ma, Z.; Qiao, Y.

2026-04-13 gastroenterology 10.64898/2026.04.11.26350690 medRxiv
Top 13%
0.3%

Background: The enterotype concept proposed that gut microbiomes cluster into discrete types, but subsequent critiques demonstrated that such clustering depends on methodological choices, that the number of clusters is not fixed, and that faecal samples cannot capture spatial heterogeneity along the gastrointestinal tract. The stomach remains particularly understudied, and no systematic classification exists for gastric microbial community types. Methods: We assembled a multi-cohort dataset of 566 gastric mucosal samples spanning healthy controls to gastric cancer, with both Helicobacter pylori (HP)-negative and HP-positive individuals. Critically, we applied the key methodological lessons of the enterotype debate: we used a variational autoencoder (VAE) for dimensionality reduction to learn a continuous latent representation without forcing discrete structure, determined the optimal number of clusters using the Silhouette index (an absolute validation measure) across K=2 to K=10 rather than arbitrarily selecting a cluster number, and performed transparent evaluation of multiple clustering solutions. This VAE-plus-silhouette workflow directly addresses the critiques leveled against the original enterotype analysis. Results: Four gastrotypes were identified, with K=4 achieving the highest mean silhouette score, indicating good cluster cohesion and separation. Two gastrotypes (Variovorax-type and Trabulsiella-type) were significantly enriched in HP-positive samples, while two gastrotypes (Bacteroides-type and Streptococcus-type) were significantly enriched in HP-negative samples. Random Forest and Gradient Boosting achieved excellent baseline performance for predicting HP infection (AUC = 0.990 and 0.993). Conclusions: The VAE-plus-silhouette workflow provides a robust, data-driven approach for identifying gastrotypes without forcing discrete structure or arbitrarily fixing cluster numbers. 
Using this framework, we identified four gastrotypes with significantly different HP infection rates. Variovorax-type and Trabulsiella-type showed strong HP-positive enrichment, while Bacteroides-type and Streptococcus-type showed strong HP-negative enrichment. These findings demonstrate that methodological advances from the enterotype controversy can be successfully transferred to the stomach, offering a reproducible taxonomy for stratifying HP infection status with potential clinical utility.
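
The cluster-number selection step in this abstract (scan K=2 to K=10 and keep the K with the highest mean silhouette score) can be illustrated with a minimal, standard-library-only sketch. Several things here are assumptions rather than details from the abstract: synthetic 2-D points stand in for the VAE latent embeddings, k-means with farthest-point initialisation stands in for the unspecified clustering algorithm, and the helper names `dist`, `kmeans`, and `mean_silhouette` are hypothetical.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=25):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]

def mean_silhouette(points, labels):
    """Mean of s = (b - a) / max(a, b) over all points; singletons score 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, points[j]) for j in idx) / len(idx)
                for lab, idx in clusters.items() if lab != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Synthetic stand-in for latent embeddings: four well-separated 2-D blobs.
rng = random.Random(0)
blob_centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
latent = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in blob_centers for _ in range(15)]

# Scan K = 2..10 and keep the K with the highest mean silhouette score.
scores = {k: mean_silhouette(latent, kmeans(latent, k)) for k in range(2, 11)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Reporting the full `scores` dictionary, not just `best_k`, corresponds to the "transparent evaluation of multiple clustering solutions" that the abstract emphasises.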

20
Loss of MITF activity leads to emergent cell states from the melanocyte stem cell lineage

Brombin, A.; MacMaster, S.; Travnickova, J.; Wyatt, C.; Brunsdon, H.; Ramsey, E.; Vu, H. N.; Steingrimsson, E.; Kenny, C.; Chandra, T.; Patton, E. E.

2026-04-12 developmental biology 10.64898/2025.12.23.695681 medRxiv
Top 13%
0.3%

How embryonic cells generate large clones of cells in the adult represents a fundamental question in biology. Here, using melanocyte stem cells (McSCs) in the zebrafish as a model, we explore the function of the master melanocyte transcription factor (MITF) in safeguarding McSCs in embryonic development and their potential to pigment large clones in the adult. MITF is well known for its role in the specification of melanoblasts from the neural crest (NC) and their differentiation into melanocytes, yet little is known about how this activity shapes the stem cell lineages. Here, we use live imaging coupled with single-cell transcriptomics and lineage tracing to show that MITF (mitfa in zebrafish) protects the melanocyte stem cell (McSC) fate in zebrafish. Utilizing a temperature-sensitive mitfavc7 mutant, we show loss of Mitfa leads to a surprising premature and aberrant expansion of McSC progeny at the niche during embryogenesis, coupled with novel emergent transcriptional cell states. Lineage tracing of McSCs from the embryonic to juvenile stages reveals Mitfa activity is subsequently required in regeneration by Schwann cell-like and melanocyte stem cell progenitors that serve as a reservoir for fast-responding pigment progenitors. Thus, the impact of Mitfa loss on the melanocyte lineage is cell-state and stage-specific. The emergent cell states upon mitfa loss may have important implications for our understanding of the loss of MITF activity in human genetic disease and melanoma.